Goto

Collaborating Authors

 relative entropy






Empirical Risk Minimization with $f$-Divergence Regularization

Daunas, Francisco, Esnaola, Iñaki, Perlaza, Samir M., Poor, H. Vincent

arXiv.org Machine Learning

In this paper, the solution to the empirical risk minimization problem with $f$-divergence regularization (ERM-$f$DR) is presented and conditions under which the solution also serves as the solution to the minimization of the expected empirical risk subject to an $f$-divergence constraint are established. The proposed approach extends applicability to a broader class of $f$-divergences than previously reported and yields theoretical results that recover previously known results. Additionally, the difference between the expected empirical risk of the ERM-$f$DR solution and that of its reference measure is characterized, providing insights into previously studied cases of $f$-divergences. A central contribution is the introduction of the normalization function, a mathematical object that is critical in both the dual formulation and practical computation of the ERM-$f$DR solution. This work presents an implicit characterization of the normalization function as a nonlinear ordinary differential equation (ODE), establishes its key properties, and subsequently leverages them to construct a numerical algorithm for approximating the normalization factor under mild assumptions. Further analysis demonstrates structural equivalences between ERM-$f$DR problems with different $f$-divergences via transformations of the empirical risk. Finally, the proposed algorithm is used to compute the training and test risks of ERM-$f$DR solutions under different $f$-divergence regularizers. This numerical example highlights the practical implications of choosing different functions $f$ in ERM-$f$DR problems.


Optimal Anytime-Valid Tests for Composite Nulls

Shekhar, Shubhanshu

arXiv.org Machine Learning

We consider the problem of designing optimal level-$α$ power-one tests for composite nulls. Given a parameter $α\in (0,1)$ and a stream of $\mathcal{X}$-valued observations $\{X_n: n \geq 1\} \overset{i.i.d.}{\sim} P$, the goal is to design a level-$α$ power-one test $τ_α$ for the null $H_0: P \in \mathcal{P}_0 \subset \mathcal{P}(\mathcal{X})$. Prior works have shown that any such $τ_α$ must satisfy $\mathbb{E}_P[τ_α] \geq \tfrac{\log(1/α)}{γ^*(P, \mathcal{P}_0)}$, where $γ^*(P, \mathcal{P}_0)$ is the so-called $\mathrm{KL}_{\inf}$ or minimum divergence of $P$ to the null class. In this paper, our objective is to develop and analyze constructive schemes that match this lower bound as $α\downarrow 0$. We first consider the finite-alphabet case~($|\mathcal{X}| = m < \infty$), and show that a test based on \emph{universal} $e$-process~(formed by the ratio of a universal predictor and the running null MLE) is optimal in the above sense. The proof relies on a Donsker-Varadhan~(DV) based saddle-point representation of $\mathrm{KL}_{\inf}$, and an application of Sion's minimax theorem. This characterization motivates a general method for arbitrary $\mathcal{X}$: construct an $e$-process based on the empirical solutions to the saddle-point representation over a sufficiently rich class of test functions. We give sufficient conditions for the optimality of this test for compact convex nulls, and verify them for Hölder smooth density models. We end the paper with a discussion on the computational aspects of implementing our proposed tests in some practical settings.


Performance Guarantees for Quantum Neural Estimation of Entropies

Sreekumar, Sreejith, Goldfeld, Ziv, Wilde, Mark M.

arXiv.org Artificial Intelligence

Estimating quantum entropies and divergences is an important problem in quantum physics, information theory, and machine learning. Quantum neural estimators (QNEs), which utilize a hybrid classical-quantum architecture, have recently emerged as an appealing computational framework for estimating these measures. Such estimators combine classical neural networks with parametrized quantum circuits, and their deployment typically entails tedious tuning of hyperparameters controlling the sample size, network architecture, and circuit topology. This work initiates the study of formal guarantees for QNEs of measured (Rényi) relative entropies in the form of non-asymptotic error risk bounds. We further establish exponential tail bounds showing that the error is sub-Gaussian, and thus sharply concentrates about the ground truth value. For an appropriate sub-class of density operator pairs on a space of dimension $d$ with bounded Thompson metric, our theory establishes a copy complexity of $O(|Θ(\mathcal{U})|d/ε^2)$ for QNE with a quantum circuit parameter set $Θ(\mathcal{U})$, which has minimax optimal dependence on the accuracy $ε$. Additionally, if the density operator pairs are permutation invariant, we improve the dimension dependence above to $O(|Θ(\mathcal{U})|\mathrm{polylog}(d)/ε^2)$. Our theory aims to facilitate principled implementation of QNEs for measured relative entropies and guide hyperparameter tuning in practice.


Quantum Information Ordering and Differential Privacy

Dasgupta, Ayanava, Warsi, Naqueeb Ahmad, Hayashi, Masahito

arXiv.org Artificial Intelligence

We study quantum differential privacy (QDP) by defining a notion of the order of informativeness between two pairs of quantum states. In particular, we show that if the hypothesis testing divergence of the one pair dominates over that of the other pair, then this dominance holds for every f -divergence. This approach completely characterizes (ε,δ)-QDP mechanisms by identifying the most informative (ε,δ)-DP quantum state pairs. We apply this to analyze the stability of quantum differentially private learning algorithms, generalizing classical results to the case δ > 0. Additionally, we study precise limits for privatized hypothesis testing and privatized quantum parameter estimation, including tight upper-bounds on the quantum Fisher information under QDP . Finally, we establish near-optimal contraction bounds for differentially private quantum channels with respect to the hockey-stick divergence. I. Introduction A fundamental challenge in modern machine learning is the trade-off between privacy and information extraction. In this work, we explicitly treat both sides: privacy (ensuring that algorithmic outputs do not reveal significant information about the input data of the respondents) and the investigator's goal to extract as much useful information as possible from data for accurate learning and estimation. With the rapid advancement of machine learning, a key concern is about ensuring the privacy of learning algorithms, meaning that their outputs should not reveal significant information about the input data. Differential privacy (DP) provides a rigorous mathematical framework to balance these opposing requirements. Accordingly, we structure our contributions in three steps: first step (privacy), second step (information extraction under privacy constraints), and third step, the quantum channel setup, where the situation is more complicated, and we mark the transition to each step explicitly in the text. This step develops the privacy side of the trade-off from the respondent's perspective by studying the stability [1], [2] of learning algorithms. From the respondent's viewpoint, privacy means that the inclusion or exclusion of their individual data should not materially affect the mechanism's output, so that they can contribute data without fear of singled-out inference. An algorithm is considered stable if its output does not change drastically when a single respondent's data is changed; this point-wise insensitivity is precisely the respondent-centric guarantee we seek.


Contraction and entropy production in continuous-time Sinkhorn dynamics

Srinivasan, Anand, Slotine, Jean-Jacques

arXiv.org Machine Learning

Recently, the vanishing-step-size limit of the Sinkhorn algorithm at finite regularization parameter $\varepsilon$ was shown to be a mirror descent in the space of probability measures. We give $L^2$ contraction criteria in two time-dependent metrics induced by the mirror Hessian, which reduce to the coercivity of certain conditional expectation operators. We then give an exact identity for the entropy production rate of the Sinkhorn flow, which was previously known only to be nonpositive. Examining this rate shows that the standard semigroup analysis of diffusion processes extends systematically to the Sinkhorn flow. We show that the flow induces a reversible Markov dynamics on the target marginal as an Onsager gradient flow. We define the Dirichlet form associated to its (nonlocal) infinitesimal generator, prove a Poincaré inequality for it, and show that the spectral gap is strictly positive along the Sinkhorn flow whenever $\varepsilon > 0$. Lastly, we show that the entropy decay is exponential if and only if a logarithmic Sobolev inequality (LSI) holds. We give for illustration two immediate practical use-cases for the Sinkhorn LSI: as a design principle for the latent space in which generative models are trained, and as a stopping heuristic for discrete-time algorithms.


Compression with Bayesian Implicit Neural Representations

Neural Information Processing Systems

Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the β -ELBO, and target different rate-distortion trade-offs for a given network architecture by adjusting β . Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.